ABSTRACT

The ability to model and predict the popularity dynamics of
individual user-generated items on online media has important
implications in a wide range of areas. In this paper, we
propose a probabilistic model using a Self-Excited Hawkes
Process (SEHP) to characterize the process through which
individual microblogs gain their popularity. This model
explicitly captures the triggering effect of each forwarding,
distinguishing itself from the reinforced Poisson process based
model, where all previous forwardings are simply aggregated
as a single triggering effect. We validate the proposed model
by applying it to Sina Weibo, the most popular microblogging
network in China. Experimental results demonstrate
that the SEHP model consistently outperforms the model
based on the reinforced Poisson process.

1. INTRODUCTION

With the explosive growth of User Generated Content
(UGC) on online media, predicting the popularity dynamics
of UGC items, such as microblogs, tweets, and videos, has
become an important problem. Popularity prediction has
important implications in many domains, including viral
marketing and public opinion monitoring. Early studies
were devoted to characterizing the distribution of
popularity over an aggregation of UGC items [2] and to making
predictions by exploiting temporal correlations [HE].

Recently, researchers began to model the popularity
dynamics of individual UGC items jail]. However, these
models usually assume an aggregate stochastic process without
distinguishing the triggering effects of different forwarding
actions in the diffusion-and-reaction process. Therefore,
although these models succeed in predicting, say, the
citation counts of scientific papers and the view counts of
YouTube videos, they are not well suited to modeling
popularity dynamics over a microblogging network, where
interactions among users matter greatly.

In this paper, we propose a probabilistic model using a
Self-Excited Hawkes Process (SEHP) to model the process
through which individual microblogs gain their popularity.
This model explicitly captures the triggering effect of each
forwarding, distinguishing itself from the reinforced Poisson
process (RPP) based model presented in [4], where all
previous forwardings are simply aggregated as a single
triggering effect (see Fig. 1). We validate the proposed model
by applying it to Sina Weibo^1, the most popular microblogging
network in China. Experimental results demonstrate that
this model consistently outperforms the model based on the
reinforced Poisson process.

2. THE SEHP MODEL 

When a microblog spreads, it creates a cascade on the
microblogging network. The popularity dynamics of each
microblog during the observed time period $[0, T]$ can be
characterized by a set of time stamps $t_i$ ($1 \le i \le N$),
which denote the occurrence time of each forwarding. Here,
$N$ is the total number of forwardings. Without loss of
generality, we have
$0 = t_0 \le t_1 \le t_2 \le \cdots \le t_i \le \cdots \le t_N \le T$.
For a microblog, we model its popularity dynamics using an SEHP
characterized by the following rate function

$$\lambda(t) = \nu e^{-\beta (t - t_0)} + \alpha \sum_{j=1}^{j_{\max}(t)} e^{-\beta (t - t_j)} \qquad (1)$$

where $\nu$ is the initial triggering strength that reflects the
attractiveness of the microblog, $\alpha$ is the triggering
strength of each subsequent forwarding, and $j_{\max}(t)$ is the
index of the last forwarding before time $t$. For simplicity,
we use an exponentially decaying function with exponent $\beta$.
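
As a minimal sketch of the rate function in Eq. (1), the Python
snippet below adds the decayed contribution of the original post
to that of every forwarding that occurred strictly before t. The
function name sehp_rate and its argument names are illustrative
assumptions, not taken from the paper; time is measured from the
posting time t_0 = 0.

```python
import math


def sehp_rate(t, forward_times, nu, alpha, beta):
    """Rate lambda(t) of Eq. (1), assuming the post time t_0 is 0."""
    # Initial triggering effect of the original post, decayed since t_0 = 0.
    rate = nu * math.exp(-beta * t)
    # Each forwarding strictly before t adds its own decaying contribution.
    for t_j in forward_times:
        if t_j < t:
            rate += alpha * math.exp(-beta * (t - t_j))
    return rate
```

With no forwardings the rate reduces to the first term alone, so
sehp_rate(0.0, [], nu, alpha, beta) simply returns nu.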

According to survival theory, given that the $(i-1)$-th
forwarding arrives at $t_{i-1}$, the probability that the $i$-th
forwarding arrives at $t_i$ follows

$$p(t_i \mid t_{i-1}) = \lambda(t_i)\, e^{-\int_{t_{i-1}}^{t_i} \lambda(\tau)\, d\tau} \qquad (2)$$

and the probability that no forwarding arrives between $t_N$
and $T$ is

$$p(T \mid t_N) = e^{-\int_{t_N}^{T} \lambda(\tau)\, d\tau} \qquad (3)$$

^1 http://t.cn

Figure 1: Differences between SEHP and RPP

Figure 2: Prediction performance

Assuming that forwardings during different time intervals
are statistically independent, the likelihood of observing a
cascade of a microblog and its subsequent forwardings
during the time interval $[0, T]$ follows

$$\mathcal{L}(\alpha, \beta, \nu) = p(T \mid t_N) \prod_{i=1}^{N} p(t_i \mid t_{i-1}). \qquad (4)$$
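
One way to see how Eq. (4) simplifies is sketched below, under
the assumption that each inter-arrival density has the standard
survival form, i.e., the rate at the arrival time multiplied by
the exponential of minus the integrated rate over the preceding
interval. The exponential factors over consecutive intervals
then combine into a single integral over [0, T], which has a
closed form for the exponential kernel (taking t_0 = 0).

```latex
\begin{aligned}
\mathcal{L}(\alpha,\beta,\nu)
 &= \exp\!\Big(-\int_{0}^{T}\lambda(\tau)\,d\tau\Big)
    \prod_{i=1}^{N}\lambda(t_i),\\
\int_{0}^{T}\lambda(\tau)\,d\tau
 &= \frac{\nu}{\beta}\big(1-e^{-\beta T}\big)
  + \frac{\alpha}{\beta}\sum_{j=1}^{N}\big(1-e^{-\beta (T-t_j)}\big).
\end{aligned}
```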

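By substituting Eqs. (1), (2), and (3) into Eq. (4), we
obtain the logarithmic likelihood

$$\log \mathcal{L}(\alpha, \beta, \nu) = \frac{\nu}{\beta}\left(e^{-\beta (T - t_0)} - 1\right) + \frac{\alpha}{\beta} \sum_{i=1}^{N} \left(e^{-\beta (T - t_i)} - 1\right) + \sum_{i=1}^{N} \log\left(\nu e^{-\beta (t_i - t_0)} + \alpha \sum_{j=1}^{j_{\max}(t_i)} e^{-\beta (t_i - t_j)}\right)$$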
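
The logarithmic likelihood can be evaluated in a single pass over
the sorted forwarding times. The sketch below is one possible
implementation, assuming t_0 = 0 and strictly increasing time
stamps; the function name and the recursive update of the inner
sum are illustrative choices, not taken from the paper.

```python
import math


def sehp_log_likelihood(forward_times, T, nu, alpha, beta):
    """Log-likelihood of sorted forwarding times in [0, T], with t_0 = 0."""
    # Closed form of minus the integral of lambda over [0, T].
    log_lik = (nu / beta) * (math.exp(-beta * T) - 1.0)
    for t_j in forward_times:
        log_lik += (alpha / beta) * (math.exp(-beta * (T - t_j)) - 1.0)
    # Sum of log lambda(t_i). `excite` holds sum_j exp(-beta * (t_i - t_j))
    # over all earlier forwardings and is updated recursively.
    excite = 0.0
    prev = None
    for t_i in forward_times:
        if prev is not None:
            excite = (excite + 1.0) * math.exp(-beta * (t_i - prev))
        log_lik += math.log(nu * math.exp(-beta * t_i) + alpha * excite)
        prev = t_i
    return log_lik
```

The negative of this value can be handed to any numerical
optimizer, with all three parameters constrained to be positive,
to obtain parameter estimates.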

We employ maximum likelihood estimation to infer the
parameters in the proposed model. With the estimated
parameters, the model can be used to predict the expected
number $c(t)$ of forwardings of a microblog up to any given
time $t$. With the rate function in Eq. (1), we obtain the
prediction function

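$$c(t) = \int_{0}^{t} \lambda(\tau)\, d\tau = \frac{\nu}{\beta}\left(1 - e^{-\beta (t - t_0)}\right) + \frac{\alpha}{\beta} \sum_{j=1}^{j_{\max}(t)} \left(1 - e^{-\beta (t - t_j)}\right)$$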
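
A hedged sketch of the prediction step follows. It assumes that
the prediction function is the integral of the rate function in
Eq. (1) from 0 to t, taken over the forwardings observed before
t, with t_0 = 0; the function name is hypothetical.

```python
import math


def sehp_expected_count(t, forward_times, nu, alpha, beta):
    """Expected number of forwardings up to time t (integrated rate)."""
    # Contribution of the original post, integrated from 0 to t.
    total = (nu / beta) * (1.0 - math.exp(-beta * t))
    # Contribution of each observed forwarding, integrated from t_j to t.
    for t_j in forward_times:
        if t_j < t:
            total += (alpha / beta) * (1.0 - math.exp(-beta * (t - t_j)))
    return total
```

Under this assumption the prediction saturates for large t at
nu / beta plus alpha / beta times the number of observed
forwardings.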

3. EXPERIMENTAL VALIDATION 

Experiments are conducted on a dataset from Sina Weibo,
published by the WISE 2012 Challenge^2. We select
microblogs that were submitted during July 1-31, 2011 and
have more than 10 forwardings during the first hour and
more than 100 forwardings during the forty-eight hours after
submission. The resulting dataset consists of 5670 microblogs
and their cascades.
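
The selection criteria can be expressed as a simple filter. The
sketch below is illustrative only: the record layout (a
submission datetime plus a list of forwarding offsets in hours)
and the function name are assumptions, not the format of the
WISE 2012 Challenge data.

```python
from datetime import datetime


def keep_cascade(post_time, forward_offsets_hours):
    """Return True if a cascade meets the selection criteria in the text.

    post_time: submission datetime (hypothetical field).
    forward_offsets_hours: hours from submission to each forwarding.
    """
    in_july_2011 = datetime(2011, 7, 1) <= post_time < datetime(2011, 8, 1)
    first_hour = sum(1 for h in forward_offsets_hours if h <= 1.0)
    first_two_days = sum(1 for h in forward_offsets_hours if h <= 48.0)
    return in_july_2011 and first_hour > 10 and first_two_days > 100
```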

^2 http://www.wise2012.cs.ucy.ac.cy/challenge.html

To validate the prediction performance of the SEHP, we
compare it with the state-of-the-art model based on the
reinforced Poisson process [1], in terms of two metrics:

• Mean Absolute Percentage Error (MAPE): It measures
the average deviation between the predicted and
observed popularity over all microblogs. Denoting the
predicted popularity for a microblog $i$ up to time $t$ as
$c_i(t)$ and its actual popularity as $r_i(t)$, the MAPE over
$M$ microblogs can be written as

$$\mathrm{MAPE} = \frac{1}{M} \sum_{i=1}^{M} \left| \frac{c_i(t) - r_i(t)}{r_i(t)} \right|$$

• Accuracy: It measures the fraction of microblogs
correctly predicted under a given error tolerance $\epsilon$.
Specifically, the accuracy of popularity prediction over $M$
microblogs is

$$\mathrm{Accuracy} = \frac{1}{M} \sum_{i=1}^{M} \mathbb{1}\left[ \left| \frac{c_i(t) - r_i(t)}{r_i(t)} \right| \le \epsilon \right]$$

The threshold $\epsilon$ is set as 0.2 in this paper.
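
Both metrics are straightforward to compute from paired lists of
predicted and actual popularity at a fixed time t. The snippet
below is a minimal sketch; the function names are illustrative,
every actual count is assumed to be positive, and a prediction
is counted as correct when its relative error is at most the
tolerance.

```python
def mape(predicted, actual):
    """Mean absolute percentage error over M microblogs."""
    errors = [abs(c - r) / r for c, r in zip(predicted, actual)]
    return sum(errors) / len(errors)


def accuracy(predicted, actual, eps=0.2):
    """Fraction of microblogs whose relative error is at most eps."""
    hits = sum(1 for c, r in zip(predicted, actual) if abs(c - r) / r <= eps)
    return hits / len(actual)
```

For example, predictions of 110, 90, and 150 against an actual
count of 100 each have relative errors of 0.1, 0.1, and 0.5, so
two of the three fall within the 0.2 tolerance.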


We set the training period, i.e., T, as 6 hours and then 
predict the popularity for each microblog from the 1st to 
42nd hour after the training period. As shown in Fig. 2, the 
SEHP model consistently exhibits lower error and higher 
accuracy than the RPP model. 


4. CONCLUSIONS 

In this paper, we proposed a probabilistic model to
characterize and predict the popularity dynamics of microblogs
using an SEHP. Experiments on a Sina Weibo dataset
demonstrated that this model consistently outperforms the baseline
model based on the reinforced Poisson process.